Abstract

Define a chi-square random field on a multi-dimensional lattice-point index
set with a direct-product covariance structure, and consider the distribution of the
maximum of this random field. We provide two approximate formulas for the upper
tail probability of the distribution based on nonlinear renewal theory and an
integral-geometric approach called the volume-of-tube method. This study is motivated by
the detection problem of the interactive loci pairs which play an important role in
forming biological species. The joint distribution of scan statistics for detecting the
pairs is regarded as the chi-square random field above, and hence the
multiplicity-adjusted p-value can be calculated by using the proposed approximate formulas.
By using these formulas, we examine the data of Mizuta, Harushima and Kurata
(2010), who reported a new interactive loci pair of rice inter-subspecies.

1 Introduction

1.1 Tests of multiplicity in detecting loci interactions 

In genomic data analyses, genome scans for detecting loci that have some particular and
interesting functions are often undertaken. These procedures are regarded as repeated
statistical tests, and hence they are formalized as multiple testing procedures. In
such multiple testing, one crucial point is how to adjust for the multiplicity of tests. This is
because the method of adjustment seriously affects the interpretation of the data analysis.

The detection of the interactive loci pairs assumed to exist in the
Bateson-Dobzhansky-Muller (BDM) model, which motivates our study, is such a genome scan problem. As a
biological concept, "species" are defined as "groups of interbreeding natural populations
which are reproductively isolated from other such groups" (Mayr (1942)). The genetic
mechanism for separating species is called reproductive isolation, which is observed as
hybrid sterility or hybrid inviability between particular groups. The BDM model is a model
for explaining the evolution of genetic incompatibility genes. More precisely, the BDM model
assumes that there exist pairs of loci such that when the loci have particular genotypes,
sterility or inviability occurs and hence a descendant is not produced (Dobzhansky (1951),
Coyne and Orr (2004)). In this paper, we refer to the interactive loci pair as the BDM
pair.

The importance of studying such interactive loci pairs is widely acknowledged. However,
few studies have succeeded in identifying such pairs and in revealing the mechanism
behind them. For the detection of BDM pairs, choosing two groups used for crossing is
crucial but difficult. If parents are genetically separate, then descendants cannot be
produced. Conversely, if parents are too close, then sterility or inviability cannot be observed.
The detection of a BDM pair of Arabidopsis intra-species by Bikard et al. (2009), and the
detection of a BDM pair of rice inter-subspecies by Mizuta et al. (2010) are exceptionally
successful studies.

The original purpose of this paper is to give an answer to a statistical problem that
Mizuta et al. (2010) faced in the course of their studies. Figure 1.1 is the contour plot
depicting scan statistics for detecting BDM pairs in a second filial generation (F2) population
from two rice subspecies used by Mizuta et al. (2010). The horizontal and vertical axes
represent loci positions in the 12 chromosomes of rice. Each scan statistic is a chi-square
statistic with 4 degrees of freedom, and the number of statistics is around 500,000. Because
of the large number of tests, some adjustment for the multiplicity of tests is necessary.
Bonferroni adjustments are frequently used in multiple testing. However, in our case,
where the statistics are highly correlated with each other, the Bonferroni adjustment, which
is calculated without information on the correlation, would lead to very conservative results.

The multiplicity-adjusted p-value for correlated scan statistics is defined from the
distribution of their maximum. For calculating this distribution, we require knowledge of
the correlation structure or joint distribution. This structure can be determined from
the experimental design in the case of crossing experiments such as the detection problem of
BDM pairs. In particular, when the number of statistics is large and when the correlation
structure is systematic, we can consider a large number of scan statistics as a random
field and can obtain the distribution of the maximum. The distribution of the maximum
of a random field (process) has been extensively studied. In this paper, the approaches
we use are nonlinear renewal theory and the volume-of-tube method (tube method). The
nonlinear renewal theory we use was developed by Woodroofe (1982) and Siegmund (1985,
1988). In this method, a random field is locally treated as a random walk, and the
distribution of its maximum is obtained by using sequential analysis. The volume-of-tube
method is an integral-geometric approach for approximating the distribution of the
maximum of a Gaussian random field through evaluating the volume of the index set (Sun
(1993), Kuriki and Takemura (2001, 2009)). Mathematically, this is equivalent to applying
the Euler characteristic heuristic to a Gaussian field (Takemura and Kuriki (2002),
Adler and Taylor (2007)).

Figure 1.1: Contour plot of chi-square statistics

This paper is organized as follows. In Section 1.2, we explain the scan statistics for
detecting BDM pairs. Under the null hypothesis that a BDM pair does not exist, we see
that the joint distribution of the scan statistics is regarded asymptotically as a chi-square
random field with a direct-product covariance structure restricted to a lattice point index
set. We also discuss other statistical problems that have the same stochastic structure
as the detection of BDM pairs in Section 1.3. In Section 2, we formalize this chi-square
random field in a general setting, and provide approximate formulas for its maximum
distribution by using nonlinear renewal theory and the volume-of-tube method. Renewal
theory assumes that the lattice points are equally spaced. This assumption may be
unreasonable, because it implies that marker spacings are uniform. Hence, we use numerical
comparisons to examine the difference between the randomly spaced case and the equally
spaced case. The volume-of-tube method yields asymptotically conservative bounds by
embedding the random field defined on a discrete set (i.e., unequally spaced lattice points)
into a random field that has a continuous and piecewise smooth sample path. In Section
3, we analyze the data of Mizuta et al. (2010). They first screened the candidates of loci
by analyzing datasets from two F2 populations and reciprocal backcross (BC) populations,
and finally succeeded in isolating causal genes of a BDM pair by positional cloning.
We examine their data, and confirm that their genetic finding about the BDM pair is
significant from the viewpoint of multiple testing procedures. The proofs of Proposition
1.1, which describes the asymptotic correlation structure of the chi-square statistics for
detecting interactive pairs, and the tail probability formulas in Theorems 2.1 and 2.2 are
given in Section 4.



1.2 Scan statistics for the detection of interactive loci pairs 

In this subsection, we explain the scan statistic for detecting BDM pairs and its asymptotic
joint distribution for the case of the F2 population dealt with by Mizuta et al. (2010).

We focus on the number of F2 individuals that avoided such a fatal event and grew
up. Each locus of an individual in the F2 population produced by two strains A and B
has one of the genotypes AA, BB, and AB. Abbreviating them to A, B, and H, respectively,
the genotypes of loci 1 and 2 are cross-classified in Table 1.1. If this table shows some
discrepancy against the independence of rows and columns, then the lack of individuals
(sterility) is assumed to have happened when the loci pair has particular genotypes.
Noting this, Mizuta et al. (2010) used the chi-square statistics for independence (Pearson's
chi-square statistics) as scan statistics for detection. Similar scan statistics are used by
Kao et al. (2010) in an F1 spore population from an inter-species cross of yeast.

Table 1.1: Cross table of genotypes in two loci (F2)

locus 1 \ locus 2    A           B           H
A                    $n_{AA}$    $n_{AB}$    $n_{AH}$
B                    $n_{BA}$    $n_{BB}$    $n_{BH}$
H                    $n_{HA}$    $n_{HB}$    $n_{HH}$



Let $T_{c_1 c_2}(j_1, j_2)$ ($c_1 < c_2$) be the chi-square statistic calculated from the pair of the
marker $j_1$ on chromosome $c_1$ and the marker $j_2$ on chromosome $c_2$. The
multiplicity-adjusted p-value can be obtained from the upper probability of the maximum of all
chi-square statistics $\max_{c_1 < c_2} \max_{j_1, j_2} T_{c_1 c_2}(j_1, j_2)$ under the null hypothesis $H_0$ that a BDM
pair does not exist. The distribution of each statistic $T_{c_1 c_2}(j_1, j_2)$ is approximated by the
chi-square distribution with 4 degrees of freedom when the number $n$ of individuals is
large. However, these statistics are not independent and are highly correlated because
of the linkage. Under the assumption of Haldane's model (see, e.g., Siegmund and Yakir
(2007), Section 5.6), which is the most standard model for linkage, the joint distribution
under the null hypothesis $H_0$ is described in Proposition 1.1 below. The proof is given in
Section 4.1.

Proposition 1.1. (a) Let $d_{1j_1}$ (M: Morgan) be locations of markers $j_1$ ($= 1, \ldots, n_1$) on
a chromosome (chromosome 1, say). Let $d_{2j_2}$ be locations of markers $j_2$ ($= 1, \ldots, n_2$) on
another chromosome (chromosome 2, say). Under the null hypothesis that a BDM pair
does not exist, as the total sample size $n$ goes to infinity, convergence in distribution

$$T_{12}(j_1, j_2) \Rightarrow Z_1(j_1, j_2)^2 + Z_2(j_1, j_2)^2 + Z_3(j_1, j_2)^2 + Z_4(j_1, j_2)^2 \quad (n \to \infty) \tag{1.1}$$

holds jointly for all $(j_1, j_2)$, where $Z_1, \ldots, Z_4$ are independent, and for each $k$, the $Z_k(i_1, i_2)$'s
are distributed according to the multivariate normal distribution with marginal mean 0,
variance 1, and the following covariance structure:

$$\mathrm{Cov}(Z_k(i_1, i_2), Z_k(j_1, j_2)) = e^{-\rho_{k1}|d_{1i_1} - d_{1j_1}|} \times e^{-\rho_{k2}|d_{2i_2} - d_{2j_2}|} \tag{1.2}$$

with

$$(\rho_{k1}, \rho_{k2}) = \begin{cases} (2, 2) & (k = 1), \\ (2, 4) & (k = 2), \\ (4, 2) & (k = 3), \\ (4, 4) & (k = 4). \end{cases} \tag{1.3}$$

(b) Under the null hypothesis that a BDM pair does not exist, $T_{c_1 c_2}$ and $T_{c_1' c_2'}$ are
asymptotically independently distributed unless $(c_1, c_2) = (c_1', c_2')$.
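
As a small illustration of the direct-product covariance in (1.2), the following Python sketch evaluates it for given marker locations. The function name `field_covariance` and the table `RHO` are hypothetical, and the decay rates for $k = 3, 4$ are assumed to be (4, 2) and (4, 4), which is consistent with the expectation value 9 quoted in Section 2.4 but should be checked against (1.3).

```python
import math

# Assumed decay rates (rho_k1, rho_k2) for k = 1..4; the values for k = 3, 4
# are an assumption of this sketch.
RHO = [(2.0, 2.0), (2.0, 4.0), (4.0, 2.0), (4.0, 4.0)]


def field_covariance(k, d1_i, d2_i, d1_j, d2_j):
    """Cov(Z_k(i1, i2), Z_k(j1, j2)) as in (1.2); locations are in Morgans."""
    rho1, rho2 = RHO[k - 1]
    return math.exp(-rho1 * abs(d1_i - d1_j)) * math.exp(-rho2 * abs(d2_i - d2_j))


# Two marker pairs that are 1 cM (0.01 M) apart on each chromosome.
print([round(field_covariance(k, 0.10, 0.30, 0.11, 0.31), 4) for k in (1, 2, 3, 4)])
```

Neighbouring marker pairs are almost perfectly correlated under this structure, which is why a Bonferroni adjustment that ignores the correlation is expected to be conservative.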

This proposition does not tell us about marker pairs belonging to the same chromosome.
When two markers are located on the same chromosome, the linkage affects the
independence of the rows and columns in Table 1.1, and the chi-square statistic simply
measures the effect of the linkage directly. Because this is irrelevant to the reproductive
isolation, we ignore such pairs.

Based on the asymptotic distribution given by Proposition 1.1, we can evaluate the
multiplicity-adjusted p-value (see (3.1)). In this context, calculation of the upper
probability of the maximum of a chi-square random field on lattice points is crucial. The
primary theoretical purpose of this paper is to provide approximate formulas for the upper
tail probability in a more general setting.



1.3 Other examples 

The covariance structure in Proposition 1.1 also appears in other scan statistics. We
illustrate two examples briefly.

The first example is the detection of epistasis in quantitative trait loci (QTL) analysis.
In QTL analysis for an F2 population, a phenotype $y$ and genotypes $z_j$ are observed for each
individual, where $j$ is the index of markers, and $z_j$ takes the values A, B, and H. The
following is a simple model of QTL analysis incorporating the effects of epistasis between
a loci pair $(j_1, j_2)$:



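$$y = \mu + \alpha_1 v_{j_1} + \beta_1 w_{j_1} + \alpha_2 v_{j_2} + \beta_2 w_{j_2} + \gamma_1 v_{j_1} v_{j_2} + \gamma_2 v_{j_1} w_{j_2} + \gamma_3 w_{j_1} v_{j_2} + \gamma_4 w_{j_1} w_{j_2} + \varepsilon,$$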



where $v_j = 1$ ($z_j = A$), $= 0$ ($z_j = H$), $= -1$ ($z_j = B$), $w_j = 1$ ($z_j = A, B$), $= -1$ ($z_j = H$),
and $\varepsilon$ is a Gaussian measurement error. The parameters $\gamma_1, \ldots, \gamma_4$ represent the epistasis.
For identifying the loci pair $(j_1, j_2)$, the scan statistic $U(j_1, j_2)$ defined as the likelihood
ratio test (LRT) statistic for testing the null hypothesis of no epistasis $\gamma_1 = \cdots = \gamma_4 = 0$ is
used. It is shown that the asymptotic joint distribution of $\{U(j_1, j_2)\}$ is the same as that
of $\{T(j_1, j_2)\}$ in Proposition 1.1 when $j_1$ and $j_2$ are on different chromosomes, and the
multiplicity-adjusted p-value can be obtained similarly.

The second example is the detection of a change-point in two-way ordered categorical
data. For the cell probabilities $\{p_{ij}\}_{a \times b}$, Hirotsu (1997) assumed a log-linear model with a
change-point at $(i_0, j_0)$:

$$\log p_{ij} = \alpha_i + \beta_j + \gamma \, 1(i \le i_0, j \le j_0),$$

where $1(\cdot)$ is the indicator function, and defined a scan statistic $V(i_0, j_0)$ as the LRT statistic
for testing $\gamma = 0$. Under the null hypothesis, $\{V(i_0, j_0)\}_{i_0 = 1, \ldots, a, \, j_0 = 1, \ldots, b}$ is asymptotically
equivalent to $\{Z_1(j_1, j_2)^2\}$ in Proposition 1.1 with $d_{1j} = \log\frac{P_j}{1 - P_j}$, $d_{2j} = \log\frac{Q_j}{1 - Q_j}$,
$P_j = \sum_{k=1}^{j} \sum_{l=1}^{b} p_{kl}$, $Q_j = \sum_{k=1}^{a} \sum_{l=1}^{j} p_{kl}$, and the multiplicity-adjusted p-value can be obtained in
our framework.



2 Approximate tail probabilities 

2.1 Chi-square random fields restricted on lattice points 

In this section, as a generalization of the random field referred to in Proposition 1.1, we
define a chi-square random field on a multi-dimensional index set with a direct-product
type covariance structure such as (1.2), and consider the distribution of its maximum over
multi-dimensional lattice points.

For $k = 1, \ldots, m$, let us consider a real-valued continuous Gaussian random field $Z_k(t)$ on
$\mathbb{R}^p$ that has the following moment structure:

$$E[Z_k(t)] = 0, \quad V[Z_k(t)] = 1, \quad \mathrm{Cov}(Z_k(t), Z_k(t')) = R_k(t - t'),$$

where for $h = (h_1, \ldots, h_p)$,

$$R_k(h) = \prod_{i=1}^{p} R_{ki}(h_i), \quad R_{ki}(h_i) = 1 - \rho_{ki}|h_i| + o(|h_i|) \ \text{ as } h_i \to 0, \tag{2.1}$$

and $\rho_{ki}$ is a positive constant. In particular, when $R_{ki}(h_i) = e^{-\rho_{ki}|h_i|}$, this expression
represents the direct-product covariance structure of the stationary Ornstein-Uhlenbeck
process. $Z_1, \ldots, Z_m$ are assumed to be independent. Moreover, define

$$Z(t) = (Z_1(t), \ldots, Z_m(t)), \quad Y(t) = \sqrt{\sum_{k=1}^{m} Z_k(t)^2}. \tag{2.2}$$

$Y(t)^2$, $t = (t_1, \ldots, t_p) \in \mathbb{R}^p$, is a chi-square random field whose marginal distribution is
the chi-square distribution with $m$ degrees of freedom.

For $i = 1, \ldots, p$, let $0 = d_{i0} < d_{i1} < \cdots < d_{in_i}$ be distinct points, and let
$T_i = \{d_{i0} (= 0), d_{i1}, \ldots, d_{in_i}\}$. Define a $p$-dimensional unequally spaced lattice point set

$$T = T_1 \times \cdots \times T_p \subset \mathbb{R}^p.$$

In this section, we provide an approximate formula for the tail probability of the maximum
of the chi-square random field $Y$ restricted to the discrete set $T$:

$$P\Big(\max_{t \in T} Y(t) > b\Big) \quad \text{as } b \to \infty. \tag{2.3}$$



2.2 Approximations based on nonlinear renewal theory 

In this subsection, we study large-deviation approximations for the distribution of the
maximum based on the nonlinear renewal theory of Woodroofe (1982) and Siegmund (1988).
The maximum $\max_{t \in T} Y(t)$ can be approximated by the maximum of a suitably defined random walk
when $Y$ is large and the spacing of the lattice is small. We then evaluate the distribution
of its maximum with the help of sequential analysis.

A drawback of the method is that the index set $T$ must be an equally spaced lattice
point set. That is, for all $i$, the points $d_{i0} < \cdots < d_{in_i}$ belonging to $T_i$ are assumed to be
equally spaced as

$$d_{i1} - d_{i0} = \cdots = d_{in_i} - d_{i,n_i-1} \ (= D_i, \text{ say}).$$

If the spacings are not equal, the random walk in the limit does not approach the sum of
identical distributions, and hence one cannot utilize the reproductivity in the sequential
analysis. However, as we show in Section 2.4, in typical settings for genome analysis, the
upper probability for the maximum on unequally spaced lattice points is bounded above
by that for the maximum on the equally spaced lattice (i.e., the latter gives a conservative
bound for the former), and the difference between them is not substantial.
Define a bounded rectangle in $\mathbb{R}^p$ by

$$\tilde{T} = \tilde{T}_1 \times \cdots \times \tilde{T}_p \subset \mathbb{R}^p, \quad \tilde{T}_i = [0, d_{in_i}].$$

For

$$j = (j_1, \ldots, j_p) \in \mathbb{Z}^p, \quad D = (D_1, \ldots, D_p) \in \mathbb{R}^p, \tag{2.4}$$

we write $jD = (j_1 D_1, \ldots, j_p D_p)$. Our problem is to approximate the distribution of the
maximum on $p$-dimensional lattice points whose spacing in the $i$th coordinate is $D_i$ as
follows:

$$P\Big(\max_{j \in J} Y(jD) > b\Big), \quad J = \{ j \in \mathbb{Z}^p \mid jD \in \tilde{T} \}, \quad \text{as } b \to \infty.$$

By using the approach of nonlinear renewal theory, we can obtain the following
formula. The proof is given in Section 4.2.

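Theorem 2.1. As $b \to \infty$ and $D_i \to 0$ such that $b\sqrt{D_i} \to c_i \in (0, \infty)$, $i = 1, \ldots, p$,

$$P\Big(\max_{j \in J} Y(jD) > b\Big) \sim \frac{|\tilde{T}|}{(2\pi)^{m/2}} b^{m+2p-2} e^{-b^2/2} \int_{S^{m-1}} \prod_{i=1}^{p} \rho_i \, \nu\big(b\sqrt{2\rho_i D_i}\big) \, du, \tag{2.5}$$

where $du$ is the volume element of the unit sphere $S^{m-1}$ in $\mathbb{R}^m$ at $u = (u_1, \ldots, u_m) \in S^{m-1}$,

$$\rho_i = \rho_i(u) = \sum_{k=1}^{m} u_k^2 \rho_{ki}, \tag{2.6}$$

$|\tilde{T}|$ is the Lebesgue measure of $\tilde{T}$, and

$$\nu(x) = \begin{cases} 2x^{-2} \exp\Big\{-2\sum_{n=1}^{\infty} n^{-1} \Phi\big(-\tfrac{1}{2} x \sqrt{n}\big)\Big\} & (x > 0), \\ 1 & (x = 0), \end{cases}$$

with $\Phi(\cdot)$ the cumulative distribution function of the standard normal distribution.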

Remark 2.1. The function $\nu(x)$ can be conveniently approximated by the following:

$$\nu(x) \approx \frac{(2/x)(\Phi(x/2) - 1/2)}{(x/2)\Phi(x/2) + \phi(x/2)}, \tag{2.7}$$

where $\phi(\cdot)$ is the density function of the standard normal distribution (Siegmund and Yakir
(2007)). We use this in the numerical calculations presented in Section 2.4.
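
To make the role of the special function concrete, here is a minimal Python sketch that evaluates $\nu(x)$ both from its series definition and from the closed-form approximation of Remark 2.1. The function names are hypothetical, and the closed form is written with a plus sign in the denominator, which is the form for which $\nu(x) \to 1$ as $x \to 0$; treat this as an assumption of the sketch rather than a transcription of the paper.

```python
import math


def std_normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def nu_series(x, tol=1e-12, max_terms=1_000_000):
    """nu(x) from the series definition; convergence is slow for very small x."""
    if x == 0.0:
        return 1.0
    total = 0.0
    for n in range(1, max_terms + 1):
        term = std_normal_cdf(-0.5 * x * math.sqrt(n)) / n
        total += term
        if term < tol:  # remaining terms decay faster than exponentially
            break
    return 2.0 / (x * x) * math.exp(-2.0 * total)


def nu_approx(x):
    """Closed-form approximation of nu(x)."""
    if x == 0.0:
        return 1.0
    half = 0.5 * x
    # Phi(x/2) - 1/2 is computed as erf(...) / 2 to avoid cancellation near 0.
    numerator = (2.0 / x) * 0.5 * math.erf(half / math.sqrt(2.0))
    denominator = half * std_normal_cdf(half) + std_normal_pdf(half)
    return numerator / denominator


for x in (0.5, 1.0, 2.0):
    print(x, nu_series(x), nu_approx(x))
```

The two evaluations can be compared on a grid of $x$ values to judge whether the approximation is accurate enough for the spacings of interest.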



Remark 2.2. The upper tail probability of the maximum of a continuous chi random field
$Y$ over a continuous set $\tilde{T}$ can be obtained by following Piterbarg (1996), Corollary 7.1,
as follows:

$$P\Big(\max_{t \in \tilde{T}} Y(t) > b\Big) \sim \frac{|\tilde{T}|}{(2\pi)^{m/2}} b^{m+2p-2} e^{-b^2/2} \int_{S^{m-1}} \prod_{i=1}^{p} \rho_i \, du. \tag{2.8}$$

This coincides with the right-hand side of (2.5) with $c_i = 0$. Since $\max_{t \in T} Y(t) \le
\max_{t \in \tilde{T}} Y(t)$, (2.8) is an asymptotic upper bound for (2.3).

2.3 Approximations based on the volume-of-tube method

In this subsection, we provide a conservative bound for the distribution of the maximum
of a chi-square random field (2.3) by adopting an integral-geometric approach referred to
as the volume-of-tube method or the Euler characteristic heuristic.

The volume-of-tube method approximates the distribution of the maximum of a Gaussian
random field that has a continuous and piecewise smooth sample path. It is particularly
useful when the marginal distribution (with a fixed index) is standard normal $N(0, 1)$.
(See Sun (1993), Kuriki and Takemura (2001, 2009), Takemura and Kuriki (2002), and
Adler and Taylor (2007).) In order to apply the volume-of-tube method to our problem,
we need to describe our problem in terms of a Gaussian random field with a continuous
and piecewise smooth sample path.

First, we modify the Gaussian random field $Z_k$ on a discrete set $T$ to define a Gaussian
random field $\tilde{Z}_k$ on a continuous set $\tilde{T}$ that has the following properties:

(a) $\tilde{Z}_k(t) = Z_k(t)$ (if $t \in T$).

(b) As a function of $t \in \tilde{T}$, $\tilde{Z}_k(t)$ is continuous and piecewise smooth.

Note that continuous processes with the covariance structures given by (2.1) do not satisfy
(b). This is because the covariance function is not differentiable at $h = 0$, and hence the
sample path is nowhere differentiable.

Define a chi random field on the index set $\tilde{T}$ by

$$\tilde{Y}(t) = \sqrt{\sum_{k=1}^{m} \tilde{Z}_k(t)^2}.$$

In addition, define a Gaussian random field on the index set $\tilde{T} \times S^{m-1}$ by

$$X(t, u) = \sum_{k=1}^{m} u_k \tilde{Z}_k(t), \quad u = (u_1, \ldots, u_m) \in S^{m-1}.$$

Since $Y(t) = \tilde{Y}(t) = \max_{u \in S^{m-1}} X(t, u)$ for $t \in T$, we can use the upper probability of
$\max_{t \in \tilde{T}} \tilde{Y}(t) = \max_{t \in \tilde{T}, u \in S^{m-1}} X(t, u)$ as a conservative bound for that of $\max_{t \in T} Y(t)$.

Note that $X(t, u)$ with $(t, u)$ fixed has a standard normal distribution.

Under the volume-of-tube method, the index set $\tilde{T} \times S^{m-1}$ is regarded as a Riemannian
manifold endowed with the metric

$$g(t, u) = \mathrm{Cov}\big(\nabla_{(t,u)} X(t, u), \nabla_{(t,u)} X(t, u)\big) \tag{2.9}$$

at $(t, u)$. When a positive definite metric can be defined by (2.9), approximate tail
probability formulas can be obtained as asymptotic expansions involving geometric
invariants measured by this metric. However, even when the index set contains singularities
where the metric is not properly defined, if the volume $\mathrm{Vol}(\tilde{T} \times S^{m-1})$ of the index set
can be evaluated by integrals over regular sets only, the leading-term formula given
below applies (Takemura and Kuriki (2003)). Note that the dimension of the index set is
$\dim(\tilde{T} \times S^{m-1}) = p + m - 1$.

$$P\Big(\max_{t \in \tilde{T}} \tilde{Y}(t) > b\Big) = P\Big(\max_{t \in \tilde{T}, u \in S^{m-1}} X(t, u) > b\Big) \sim \mathrm{Vol}(\tilde{T} \times S^{m-1}) \cdot \frac{1}{(2\pi)^{(p+m)/2}} b^{p+m-2} e^{-b^2/2} \quad (b \to \infty). \tag{2.10}$$

There is no unique way of constructing a $\tilde{Z}_k$ satisfying (a) and (b) from $Z_k$. We
construct $\tilde{Z}_k$ by undertaking the following steps.

(i) Dissect the $p$-dimensional rectangle whose vertices are flanking lattice points of $T$,

$$[d_{1,j_1-1}, d_{1j_1}] \times \cdots \times [d_{p,j_p-1}, d_{pj_p}],$$

into $p!$ simplices.

(ii) For each simplex, define $\tilde{Z}_k$ over the simplex by linearly interpolating the values of
$Z_k$ at the vertices and multiplying by a scalar so that the variance of $\tilde{Z}_k$ at each point
of the simplex is 1.

Details of the proof of the next theorem and details of how to construct $\tilde{Z}_k$ are given in
Section 4.3.

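Theorem 2.2. Let $D_{ij} = d_{ij} - d_{i,j-1}$. As $b \to \infty$ and $\max D_{ij} \to 0$,

$$P\Big(\max_{t \in T} Y(t) > b\Big) \le P\Big(\max_{t \in \tilde{T}} \tilde{Y}(t) > b\Big) \sim \frac{V}{(2\pi)^{(p+m)/2}} b^{p+m-2} e^{-b^2/2}, \tag{2.11}$$

where

$$V = 2^{p/2} \prod_{i=1}^{p} \sum_{j=1}^{n_i} \sqrt{D_{ij}} \int_{S^{m-1}} \prod_{i=1}^{p} \sqrt{\rho_i(u)} \, du,$$

and $\rho_i(u)$ is defined in (2.6). In addition, $du$ is the volume element of $S^{m-1}$ at $u$.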



Ninomiya (2004) provided a conservative bound for the upper probability of the
maximum of a Gaussian random field on a 2-dimensional lattice with a product-type
covariance structure (2.1) in detecting a change-point in two-way ordered categorical data.
Rebaï et al. (1994) also applied the volume-of-tube method to linkage analysis. They
computed thresholds for the maximum log odds (LOD) score in the interval mapping method
by using Rice's formula, which is essentially equivalent to the volume-of-tube method.

2.4 Numerical comparisons

In this subsection, we make numerical comparisons of three approximations: the formula
based on nonlinear renewal theory (Theorem 2.1); the conservative bound based on
continuous processes (Remark 2.2); and the conservative bound based on the volume-of-tube
method (Theorem 2.2). Mindful of the problem of detecting the interactive loci pairs
(BDM pairs), as explained in Section 1, we set the parameters as follows: The dimension
of the index set is $p = 2$, the chi-square degrees of freedom are $m = 4$ and $1$, $(\rho_{k1}, \rho_{k2})$
($k = 1, 2, 3, 4$) are in (1.3), $n_1 = n_2 = 50, 100, 200$, $D_{1j} = D_{2j} = 0.2/100, 1/100, 5/100$
(equally spaced), $(D_{1j})_{j \ge 1} = (D_{2j})_{j \ge 1} = (0.5, 1, 0.5, 1, 3, 0.5, 1, 0.5, 1, 1, \ldots)/100$ (repeat
the cycle with period 10) (pattern I), $(D_{1j})_{j \ge 1} = (D_{2j})_{j \ge 1} = (0.5, 0.5, 3, 0.5, 0.5, \ldots)/100$
(repeat the cycle with period 5) (pattern II), $\tilde{T} = [0, 0.5]^2$, $[0, 1]^2$, $[0, 2]^2$. Note
that the length 1/100 corresponds to 1 cM on a chromosome.

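Let $U = (U_1, \ldots, U_m)$ be a random vector with a uniform distribution on the unit
sphere $S^{m-1}$ in $\mathbb{R}^m$. An integral over $S^{m-1}$ with respect to the volume element $du$
can be replaced by the expectation $\int_{S^{m-1}} f(u) \, du = \mathrm{Vol}(S^{m-1}) \, E[f(U)]$, $\mathrm{Vol}(S^{m-1}) =
2\pi^{m/2}/\Gamma(m/2)$. In particular, we use the following for $m = 4$:

$$E\Big[\prod_{i=1}^{2} \rho_i(U)\Big] = \frac{2\sum_{k=1}^{m} \rho_{k1}\rho_{k2} + \sum_{k=1}^{m} \rho_{k1} \sum_{k=1}^{m} \rho_{k2}}{m(m + 2)} = 9, \quad E\Big[\prod_{i=1}^{2} \sqrt{\rho_i(U)}\Big] = 2.971.$$

Moreover, we use the approximation (2.7) in calculating the special function $\nu(x)$.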
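
The sphere expectations used above can be checked numerically. The sketch below is illustrative only: it assumes the decay rates $(\rho_{k1}, \rho_{k2}) = (2,2), (2,4), (4,2), (4,4)$, uses the standard moments $E[U_k^4] = 3/(m(m+2))$ and $E[U_k^2 U_l^2] = 1/(m(m+2))$ of the uniform distribution on the sphere for the closed form, and estimates the square-root expectation by plain Monte Carlo. All function names are hypothetical.

```python
import math
import random

RHO1 = [2.0, 2.0, 4.0, 4.0]  # assumed rho_k1, k = 1..4
RHO2 = [2.0, 4.0, 2.0, 4.0]  # assumed rho_k2, k = 1..4


def expected_product(rho1, rho2):
    """Closed form of E[rho_1(U) rho_2(U)] for U uniform on the unit sphere."""
    m = len(rho1)
    cross = sum(a * b for a, b in zip(rho1, rho2))
    return (2.0 * cross + sum(rho1) * sum(rho2)) / (m * (m + 2))


def uniform_on_sphere(m, rng):
    """Draw a uniform point on S^{m-1} by normalising a Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(m)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def monte_carlo_moments(rho1, rho2, n_draws, rng):
    """Monte Carlo estimates of E[rho_1 rho_2] and E[sqrt(rho_1 rho_2)]."""
    sum_prod = 0.0
    sum_sqrt = 0.0
    for _ in range(n_draws):
        u = uniform_on_sphere(len(rho1), rng)
        r1 = sum(uk * uk * r for uk, r in zip(u, rho1))
        r2 = sum(uk * uk * r for uk, r in zip(u, rho2))
        sum_prod += r1 * r2
        sum_sqrt += math.sqrt(r1 * r2)
    return sum_prod / n_draws, sum_sqrt / n_draws


print(expected_product(RHO1, RHO2))
print(monte_carlo_moments(RHO1, RHO2, 20000, random.Random(0)))
```

Under these assumptions the closed form evaluates to exactly 9, and the Monte Carlo estimate of the square-root expectation can be compared with the value 2.971 quoted in the text.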

Figure 2.1 illustrates the comparisons among the three approximate formulas as well as
empirical distributions of Monte Carlo simulations for the probability $P(\max_{t \in T} Y(t)^2 >
b^2)$. Random numbers are generated from the following spatial autoregressive model: For
$k = 1, \ldots, m$, $i = 0, 1, \ldots, n_1$ ($= 100$), $j = 0, 1, \ldots, n_2$ ($= 100$), let $\varepsilon_k(i, j)$ be independent
standard normal random variables. Generate $Z_k(i, j)$ sequentially according
to

$$\begin{aligned}
Z_k(0, 0) &= \varepsilon_k(0, 0), \\
Z_k(i, 0) &= \alpha_k(i) Z_k(i - 1, 0) + \sqrt{1 - \alpha_k(i)^2} \, \varepsilon_k(i, 0) && (i \ge 1), \\
Z_k(0, j) &= \beta_k(j) Z_k(0, j - 1) + \sqrt{1 - \beta_k(j)^2} \, \varepsilon_k(0, j) && (j \ge 1), \\
Z_k(i, j) &= \alpha_k(i) Z_k(i - 1, j) + \beta_k(j) Z_k(i, j - 1) - \alpha_k(i) \beta_k(j) Z_k(i - 1, j - 1) \\
&\quad + \sqrt{1 - \alpha_k(i)^2} \sqrt{1 - \beta_k(j)^2} \, \varepsilon_k(i, j) && (i, j \ge 1),
\end{aligned} \tag{2.12}$$

where

$$\alpha_k(i) = e^{-\rho_{k1} D_{1i}}, \quad \beta_k(j) = e^{-\rho_{k2} D_{2j}}.$$

Then,

$$\max_{i, j \ge 0} Y(i, j)^2 = \max_{i, j \ge 0} \sum_{k=1}^{4} Z_k(i, j)^2$$

is obtained. In these figures, the transformed upper probabilities of the three approximate
formulas by using the transformation $x \mapsto 1 - e^{-x}$ are depicted. This map is adopted by
Dupuis and Siegmund (1999), (9), to restrict the maximum p-value to less than 1 without
altering the asymptotic behaviors of the tail probabilities.
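
The spatial autoregressive recursion described above can be sketched in Python as follows. This is a minimal illustration using only the standard library, not the authors' R implementation; `simulate_field` and `max_chi_square` are hypothetical names, and the square-root innovation scale $\sqrt{(1 - \alpha^2)(1 - \beta^2)}$ is the one that keeps every $Z_k(i, j)$ at unit variance with product-type correlation.

```python
import math
import random


def simulate_field(rho1, rho2, spacings1, spacings2, rng):
    """One Gaussian field Z(i, j) on a lattice with the given marker spacings."""
    n1, n2 = len(spacings1), len(spacings2)
    alpha = [0.0] + [math.exp(-rho1 * d) for d in spacings1]  # alpha[i], i = 1..n1
    beta = [0.0] + [math.exp(-rho2 * d) for d in spacings2]   # beta[j], j = 1..n2
    z = [[0.0] * (n2 + 1) for _ in range(n1 + 1)]
    z[0][0] = rng.gauss(0.0, 1.0)
    for i in range(1, n1 + 1):  # first column: AR(1) along coordinate 1
        a = alpha[i]
        z[i][0] = a * z[i - 1][0] + math.sqrt(1.0 - a * a) * rng.gauss(0.0, 1.0)
    for j in range(1, n2 + 1):  # first row: AR(1) along coordinate 2
        b = beta[j]
        z[0][j] = b * z[0][j - 1] + math.sqrt(1.0 - b * b) * rng.gauss(0.0, 1.0)
    for i in range(1, n1 + 1):  # interior: separable two-dimensional recursion
        for j in range(1, n2 + 1):
            a, b = alpha[i], beta[j]
            innovation = math.sqrt((1.0 - a * a) * (1.0 - b * b)) * rng.gauss(0.0, 1.0)
            z[i][j] = a * z[i - 1][j] + b * z[i][j - 1] - a * b * z[i - 1][j - 1] + innovation
    return z


def max_chi_square(rho_pairs, spacings1, spacings2, rng):
    """Maximum over the lattice of the sum of squares of independent fields."""
    fields = [simulate_field(r1, r2, spacings1, spacings2, rng) for r1, r2 in rho_pairs]
    n1, n2 = len(spacings1), len(spacings2)
    return max(
        sum(f[i][j] ** 2 for f in fields)
        for i in range(n1 + 1)
        for j in range(n2 + 1)
    )


rng = random.Random(2024)
rho_pairs = [(2.0, 2.0), (2.0, 4.0), (4.0, 2.0), (4.0, 4.0)]  # assumed decay rates
print(max_chi_square(rho_pairs, [0.01] * 50, [0.01] * 50, rng))
```

Repeating `max_chi_square` many times and taking the empirical upper tail gives a Monte Carlo estimate of the tail probability, at a cost that grows with the number of lattice points times the number of replications.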

Panels (a)-(c) in Figure 2.1 show that the formula based on nonlinear renewal theory
approximates the tail probabilities well over wide ranges of the marker spacing and the length of
chromosomes. In particular, the case where the degrees of freedom $m$ is 1 shows greater
accuracy than when $m = 4$. The formulas based on the volume-of-tube method and the
continuous process yield upper bounds for the upper probabilities. Compared with the
formula based on the continuous process, the volume-of-tube method is more practical because
it is less conservative. Panel (d) of Figure 2.1 shows that the statistics for unequally
spaced sampling are slightly below those for equally spaced sampling. This suggests that
the formulas for the equally spaced lattice lead to conservative p-value estimators when the
sampling spaces are unequal.



3 Detection of interactive loci pairs 
3.1 Data analysis for the F2 population 



As we explained in Section 1, Mizuta et al. (2010) conducted a genome scan of all pairs
of marker loci of F2 individuals of rice by using chi-square statistics for independence. In
this section, we reexamine the data from the viewpoint of multiple testing.

Rice has 12 chromosomes, and their total length is around 1600 cM. The two strains of
rice used to produce the F2 population are Nipponbare and Kasalath. Nipponbare is
a short-grained rice of the japonica variety, and Kasalath is a long-grained rice of the indica
variety. These two types have contrasting characteristics, and hence are used often in
QTL analysis. The F1 population was produced by using Kasalath pollen. The F2
is the offspring resulting from the self-pollination of F1 individuals. The data comprise
genotypes of 994 codominant markers at different locations covering the whole genome
for $n = 186$ individuals of the F2 population (Harushima et al. (1998)).

Figure 1.1 is a contour plot of chi-square statistics calculated from all $\binom{994}{2} \approx 500{,}000$
marker pairs. Because of linkage, the statistics are highly positively correlated, and large
values tend to appear in neighborhoods of the "high peak". (As stated in Section 1,
marker pairs on the same chromosome take large values. Because these values simply
measure the linkage, we ignore them.)

Table 3.1 shows the highest 20 peaks that do not seem to be caused by the linkage
effect. The maximum chi-square statistic is

$$\max_{1 \le c_1 < c_2 \le 12} \max_{j_1, j_2} T_{c_1 c_2}(j_1, j_2) = 33.6,$$

observed between markers on chromosomes 9 and 12. This corresponds to a p-value
of $0.9 \times 10^{-6}$ for a chi-square distribution with 4 degrees of freedom, which is highly
significant if we do not take the multiplicity of tests into account. However, because of the
high number of observed statistics (around 500,000), some adjustment for multiplicity is
required. The Bonferroni-adjusted p-value for the maximum value is $0.9 \times 10^{-6} \times 500{,}000 =
0.45$. However, this is conservative because the Bonferroni adjustment does not take into
account the highly positive correlations.
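
The arithmetic behind the unadjusted and Bonferroni-adjusted p-values can be reproduced with a few lines of Python. This is only a sketch: it uses the closed-form upper tail $e^{-x/2}(1 + x/2)$ of the chi-square distribution with 4 degrees of freedom and, as a simplifying assumption, counts all $\binom{994}{2}$ marker pairs, including same-chromosome pairs that the analysis ignores.

```python
import math


def chi2_sf_4df(x):
    """Upper tail probability of the chi-square distribution with 4 degrees of freedom."""
    return math.exp(-0.5 * x) * (1.0 + 0.5 * x)


n_markers = 994
n_pairs = math.comb(n_markers, 2)  # number of unordered marker pairs
p_single = chi2_sf_4df(33.6)       # unadjusted p-value of the largest peak
p_bonferroni = min(1.0, p_single * n_pairs)

print(n_pairs, p_single, p_bonferroni)
```

With the exact pair count the Bonferroni value comes out slightly below the rounded figure obtained with 500,000 tests; either way it is far from significant, which motivates the correlation-aware adjustments.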



Table 3.1: The largest 20 chi-square values

No.  Marker   Chr  (cM)    Marker    Chr  (cM)    Chi-square T *
 1   R1683     9    94.1   S10637A   12    13.4   33.6 (2.9)
 2   P130      6    54.0   S12886    11   116.1   33.2 (7.1)
 3   V163      5    71.1   S11447    12    95.9   26.2 (1.2)
 4   S2074     9    57.4   S10906    10     2.0   23.8 (7.2)
 5   P60       3    92.1   S2572     12    26.5   23.3 (3.1)
 6   Y5714L    1    69.1   R3203      1   160.0   21.7 (3.9)
 7   S1046     1   161.9   C946       4    10.4   20.9 (2.9)
 8   V10A      3     2.5   V133       8   107.0   20.7 (6.1)
 9   C191A     1   141.9   C1219      3   157.1   20.6 (1.7)
10   P61       1   181.7   R2965     10     2.3   20.5 (5.9)
11   S11214    1    45.6   S1520      6    15.2   20.0 (21.1)
12   G55       3    34.4   P126       6    39.6   19.8 (7.6)
13   S1046     1   161.9   G267       4   111.2   19.8 (4.3)
14   R3192     1    26.9   C922A      1   121.0   19.7 (3.0)
15   R19       3    98.2   G7004      4    72.3   19.5 (9.3)
16   P60       3    92.1   C1424      6   112.1   19.3 (3.8)
17   R2625     1   155.3   S851       3   150.1   19.2 (2.3)
18   C506      9    93.0   Y1053R    10    34.6   19.1 (3.8)
19   S10879    9    94.4   C496      11    30.3   19.0 (2.8)
20   C2523S    7     8.8   S2545     12    72.5   19.0 (1.7)

* Figures in parentheses are chi-square T's in the second experiment.



When we consider a particular chromosome pair, say $(c_1, c_2)$, the statistics $T_{c_1 c_2}(j_1, j_2)$
($j_1 = 1, \ldots, n_{c_1}$, $j_2 = 1, \ldots, n_{c_2}$) have the correlation structure described in Proposition 1.1
(a). Hence, the asymptotic null distribution of the maximum for pairs on the chromosome
pair $(c_1, c_2)$ can be evaluated. Furthermore, noting Proposition 1.1 (b), which states
that statistics on different pairs of chromosomes are asymptotically independent, we
can evaluate the multiplicity-adjusted p-values for the maximum statistics over whole
chromosomes as follows:

$$\text{p-value} = F\Big(\max_{1 \le c_1 < c_2 \le 12} \max_{j_1, j_2} T_{c_1 c_2}(j_1, j_2)\Big), \tag{3.1}$$

$$F(x) = 1 - \prod_{1 \le c_1 < c_2 \le 12} \Big\{ 1 - P\Big(\max_{(t_1, t_2) \in T_{c_1} \times T_{c_2}} Y(t_1, t_2)^2 > x\Big) \Big\},$$

where $Y$ is a chi random field defined in (2.2) with $p = 2$, $m = 4$, and $\rho_{ki}$ in (1.3). The
locations (M) of markers on chromosome $i$ are denoted by $T_i = \{d_{i0}, \ldots, d_{in_i}\}$.
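
The way per-chromosome-pair tail probabilities are combined into the genome-wide p-value can be sketched as follows. The function names are hypothetical; the inputs are whatever approximate tail probabilities one has for each of the 66 chromosome pairs, and the cap $x \mapsto 1 - e^{-x}$ is included only as one common way, assumed here, to keep an asymptotic approximation below 1.

```python
import math


def combined_p_value(pair_tail_probs):
    """1 - prod(1 - P_c) over chromosome pairs, assuming independence across pairs."""
    log_complement = sum(
        math.log1p(-min(max(p, 0.0), 1.0 - 1e-16)) for p in pair_tail_probs
    )
    return -math.expm1(log_complement)


def cap_tail_approximation(x):
    """Map x -> 1 - exp(-x), so that an approximate tail probability stays below 1."""
    return -math.expm1(-x)


# Hypothetical example: the same small tail probability for each of the 66 pairs.
print(combined_p_value([1e-3] * 66))
```

Working on the log scale with `log1p` and `expm1` avoids loss of precision when the individual tail probabilities are very small.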

The multiplicity-adjusted p-value (3.1) for the maximum chi-square of 33.6 was
estimated as 0.068 (Monte Carlo), 0.104 (renewal theory), and 0.240 (tube method). In
applying Theorem 2.1, we substituted the average of the marker spacing on chromosome
$i$ for $D_i$. None of the peaks listed in Table 3.1 was significant at the 5% level.

In the Monte Carlo method, random variables were generated from the recurrence
relations in (2.12). Computational time was 14 days and 8 hours for 10,000 iterations
using a supercomputer SGI Altix3700 and the R language.

Remark 3.1. In QTL analysis, permutation tests are commonly used for estimating the
null distribution of the maximum LOD scores (Churchill and Doerge (1994)). For our
problem, we can propose the procedure described below: The data set of the genotypes of
all individuals is denoted by $\mathcal{D}$. Let $\Pi$ be the set of all permutations of individual numbers.
Repeat steps (i)-(ii).

(i) Choose a permutation $\pi$ from $\Pi$ at random. Let $\mathcal{D}_\pi$ be the data set $\mathcal{D}$ with its
individual numbers relabeled by the permutation $\pi$.

(ii) Make cross-classified tables between all markers of $\mathcal{D}$ and all markers of $\mathcal{D}_\pi$ by their
genotypes (i.e., in Table 1.1, locus 1 is taken from $\mathcal{D}$, and locus 2 is taken from $\mathcal{D}_\pi$),
calculate the chi-square statistics from the tables, and find their maximum.

The null distribution of the maximum chi-square statistics can be estimated as the
empirical distribution of the maxima obtained in (ii).
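
Steps (i)-(ii) of the permutation procedure can be sketched in Python as follows. This is an illustrative, unoptimised implementation using only the standard library; the data layout (one genotype list per marker, with entries "A", "B" or "H") and all function names are assumptions of the sketch.

```python
import random

GENOTYPES = ("A", "B", "H")


def pearson_chi_square(x, y):
    """Pearson chi-square statistic for independence of two genotype vectors."""
    n = len(x)
    counts = {(a, b): 0 for a in GENOTYPES for b in GENOTYPES}
    for a, b in zip(x, y):
        counts[(a, b)] += 1
    row = {a: sum(counts[(a, b)] for b in GENOTYPES) for a in GENOTYPES}
    col = {b: sum(counts[(a, b)] for a in GENOTYPES) for b in GENOTYPES}
    stat = 0.0
    for a in GENOTYPES:
        for b in GENOTYPES:
            expected = row[a] * col[b] / n
            if expected > 0.0:  # skip cells in empty rows or columns
                stat += (counts[(a, b)] - expected) ** 2 / expected
    return stat


def permutation_maxima(markers, n_perm, rng):
    """markers: list of genotype vectors, one per marker, each over the same individuals."""
    n = len(markers[0])
    maxima = []
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)                                   # step (i): random relabeling
        permuted = [[m[i] for i in perm] for m in markers]  # relabeled copy of the data
        maxima.append(max(pearson_chi_square(m1, m2)        # step (ii): scan and maximise
                          for m1 in markers for m2 in permuted))
    return maxima


def permutation_p_value(observed_max, maxima):
    """Fraction of permutation maxima at least as large as the observed maximum."""
    return sum(1 for v in maxima if v >= observed_max) / len(maxima)
```

Each permutation requires a full scan over all marker pairs, so the cost per replication is comparable to that of one scan of the original data.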



However, the method described in Remark 3.1 requires at least as much computational time as the Monte Carlo method.



Moreover, Mizuta et al. (2010) performed additional genome scans for another $F_2$ population of a similar sample size. The chi-square statistics corresponding to the peaks detected in the initial experiment are listed in the last column of Table 3.1. Except for peak No. 11, all peaks in Table 3.1 showed low values of the chi-square statistic in the second scan.

3.2 Data analysis for the BC population 



Furthermore, Mizuta et al. (2010) carried out an additional experiment using the reciprocal BC population to Nipponbare. This experiment can distinguish where the interaction occurs, i.e., in the male gametophyte, the female gametophyte, or the zygote. They selected 159 markers, including those exhibiting large chi-square values in the $F_2$ data analysis, and examined the genotypes of all pairs of these selected markers in the BC populations.

Compared with the $F_2$, the types of BDM pairs that can be detected from the BC population are limited. On the other hand, the detection power (the power of the test) for detectable pairs is expected to be higher.

The BC population is the experimental crossing population produced by crossing strain A with the $F_1$ made from strains A and B. Note that there is some arbitrariness about whether the $F_1$ is used as the maternal parent or the pollen parent. The set of two BC populations corresponding to these two cases is called the reciprocal BC. Only genotype AB is observed in the $F_1$ population. Two genotypes, AA and AB, are observed in the BC population. We abbreviate these two genotypes to A and H, respectively. The genotypes of two loci 1 and 2 are cross-classified as shown in Table 3.2. The chi-square statistic for independence obtained from this table has an asymptotic chi-square distribution with 1 degree of freedom under the null hypothesis that there exists no BDM pair.

Table 3.2: Cross table of genotypes at two loci (BC)

(The table attaining the maximum chi-square is shown in parentheses)

locus 1 (Chr 6 S1520) \ locus 2 (Chr 1 S11214)  |  A              |  H
A (Nipponbare)                                  |  $n_{AA}$ (75)  |  $n_{AH}$ (13)
H                                               |  $n_{HA}$ (64)  |  $n_{HH}$ (83)



The $2\times 2$ table showing the maximum value of the chi-square statistics is given in Table 3.2 (in parentheses). The maximum value is 39.6, which was observed between chromosomes 1 and 6 in the BC population with the $F_1$ as the pollen parent. The sample size was $n = 235$. This is the loci pair listed as No. 11 in Table 3.1. In the other BC population, with the $F_1$ as the maternal parent, no significant peak was observed.
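The reported maximum can be reproduced from the counts in parentheses in Table 3.2. The sketch below uses the standard closed form of the Pearson chi-square statistic for a $2\times 2$ table; the function name is ours.

```python
def chi_square_2x2(n11, n12, n21, n22):
    """Pearson chi-square statistic for independence in a 2x2 table."""
    n = n11 + n12 + n21 + n22
    numerator = n * (n11 * n22 - n12 * n21) ** 2
    denominator = (n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22)
    return numerator / denominator

# Counts from Table 3.2: rows are locus 1 (A, H), columns are locus 2 (A, H).
counts = (75, 13, 64, 83)
sample_size = sum(counts)
stat = chi_square_2x2(*counts)
```

The counts sum to the stated sample size of 235, and the statistic rounds to the reported value of 39.6.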

In order to obtain the multiplicity-adjusted p-value for this maximum value, we need the joint distribution of the chi-square statistics. In the BC case, we can prove a proposition similar to Proposition 1.1. Part (a) of Proposition 1.1 holds if the convergence in law (1.1) is replaced with the convergence
\[
T_{12}(i_1,i_2) \xrightarrow{d} Z_1(i_1,i_2)^2 \qquad (n\to\infty).
\]
Part (b) of Proposition 1.1 holds as it is.

The multiplicity-adjusted p-value is 2.86 x 10~^ (renewal theory) and 1.57 x 10~^ 
(tube method). In either case, it is highly significant. This suggests that this pair is a candidate for the BDM pair that we are seeking and that the selection occurred in the male gametophyte, i.e., pollen. Indeed, Mizuta et al. (2010) confirmed that the male gametophyte selection of the unbearable genotype combination of the true BDM pair occurred through failure of pollen germination, and that the reciprocal disruption of duplicated genes in the two strains caused the BDM incompatibility. Note that no other significant peaks were detected.

Finally, we discuss why the interaction was detected in the BC but not in the $F_2$. As explained in Section 4.1 (see Lemma 4.1 and the succeeding descriptions), the chi-square statistic with 4 degrees of freedom obtained from Table 1.1 can be asymptotically decomposed into four chi-square components, each with 1 degree of freedom. One of the four components corresponds to the chi-square statistic obtained from Table 3.2. However, in producing the BC population, there is some arbitrariness about whether the $F_1$ is used as mother or father, and both cases are assumed to be included in the $F_2$ population, each with probability 1/2. Since the sample sizes for the $F_2$ and BC data were similar (around 200), if there was no significant component other than the one with 1 degree of freedom detected in Table 3.2 (in parentheses), it is reasonable that the chi-square statistic of 20.0 (Table 3.1, No. 11) in the $F_2$ is almost half of the value 39.6 in the BC population (pollen parent is $F_1$). In conclusion, although the chi-square statistic with 4 degrees of freedom obtained from the $F_2$ has statistical power in many directions, a larger sample size was needed to detect the BDM pair.



4 Proofs 

4.1 Proof of Proposition 1.1

First, we provide asymptotic representations of chi-square statistics for independence when the independence model is true. Let $X = (x_{ij})_{a\times b}$ ($x_{\cdot\cdot} = n$) be a contingency table distributed as a multinomial distribution with the cell probability $(p_{ij})_{a\times b}$ ($p_{\cdot\cdot} = 1$). Here, we apply the convention that the summation with respect to an index is denoted by a dot "$\cdot$". The chi-square statistic for the hypothesis of independence $H_0: p_{ij} = p_{i\cdot}p_{\cdot j}$ is denoted by
\[
T = T(X) = \sum_{i,j} \frac{(x_{ij} - x_{i\cdot}x_{\cdot j}/n)^2}{x_{i\cdot}x_{\cdot j}/n}.
\]

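The proofs of the following lemmas are easy and omitted.

Lemma 4.1. For a $3\times 3$ table $X = (x_{ij})_{1\le i,j\le 3}$, define four $2\times 2$ tables:
\[
X_1 = \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix}, \quad
X_2 = \begin{pmatrix} x_{11}+x_{12} & x_{13} \\ x_{21}+x_{22} & x_{23} \end{pmatrix}, \quad
X_3 = \begin{pmatrix} x_{11}+x_{21} & x_{12}+x_{22} \\ x_{31} & x_{32} \end{pmatrix},
\]
\[
X_4 = \begin{pmatrix} x_{11}+x_{12}+x_{21}+x_{22} & x_{13}+x_{23} \\ x_{31}+x_{32} & x_{33} \end{pmatrix}.
\]
Under $H_0$, the four statistics $T(X_1), T(X_2), T(X_3), T(X_4)$ are asymptotically distributed according to independent chi-square distributions with 1 degree of freedom, and it holds that
\[
T(X) = T(X_1) + T(X_2) + T(X_3) + T(X_4) + O_p(n^{-1/2}). \tag{4.1}
\]

Lemma 4.2. For a $2\times 2$ table $X = (x_{ij})_{1\le i,j\le 2}$ with the cell probability $(p_{ij})_{1\le i,j\le 2}$,
\[
T(X) = \frac{1}{n}\Bigl( \sum_{i,j=1}^{2} (-1)^{i+j} \sqrt{\frac{p_{i'j'}}{p_{ij}}}\, x_{ij} \Bigr)^2 + O_p(n^{-1/2}) \qquad (i' = 3-i,\ j' = 3-j) \tag{4.2}
\]
holds under $H_0$.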
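As a quick numerical illustration of the asymptotic representation, consider the equal-probability case $p_{ij} = 1/4$, which is the case used for the tables (i)-(iv) below. There the leading term of the $2\times 2$ chi-square statistic reduces to $(x_{11} - x_{12} - x_{21} + x_{22})^2/n$. The sketch below compares this with the exact statistic for a hypothetical table; the counts and function names are illustrative assumptions.

```python
def chi_square_2x2(x11, x12, x21, x22):
    """Exact Pearson chi-square statistic for a 2x2 table."""
    n = x11 + x12 + x21 + x22
    numerator = n * (x11 * x22 - x12 * x21) ** 2
    denominator = (x11 + x12) * (x21 + x22) * (x11 + x21) * (x12 + x22)
    return numerator / denominator

def leading_term_equal_cells(x11, x12, x21, x22):
    """Leading term of the statistic when all four cell probabilities equal 1/4."""
    n = x11 + x12 + x21 + x22
    return (x11 - x12 - x21 + x22) ** 2 / n

# Hypothetical table with n = 10000, close to the equal-probability model.
table = (2550, 2450, 2480, 2520)
exact = chi_square_2x2(*table)
approx = leading_term_equal_cells(*table)
```

For this table the two values agree to about four decimal places, in line with a remainder that vanishes as $n$ grows.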
For the $F_2$ individuals $t = 1, \ldots, n$ made from two strains A and B, by cross-classifying the genotypes of marker $i$ ($i = 1, \ldots, m$) on chromosome 1 and marker $j$ ($j = 1, \ldots, m$) on chromosome 2, we have the $3\times 3$ tables represented by Table 1.1. Let $T_{ij}$ be the chi-square statistic obtained from the table for marker pair $(i, j)$.

For individual $t$, let $\varepsilon_i^{(t)}$ be the genotype of locus $i$ on chromosome 1 inherited from its mother, and let $\delta_i^{(t)}$ be that from its father. Let $\tilde\varepsilon_j^{(t)}$ be the genotype of locus $j$ on chromosome 2 inherited from its mother, and let $\tilde\delta_j^{(t)}$ be that from its father. We let
\[
\varepsilon_i^{(t)},\ \delta_i^{(t)},\ \tilde\varepsilon_j^{(t)},\ \tilde\delta_j^{(t)} =
\begin{cases} 1 & \text{(from strain A)}, \\ -1 & \text{(from strain B)}. \end{cases}
\]
Then, the $4n$ random vectors $(\varepsilon_1^{(t)}, \ldots, \varepsilon_m^{(t)})$, $(\delta_1^{(t)}, \ldots, \delta_m^{(t)})$, $(\tilde\varepsilon_1^{(t)}, \ldots, \tilde\varepsilon_m^{(t)})$, $(\tilde\delta_1^{(t)}, \ldots, \tilde\delta_m^{(t)})$, $t = 1, \ldots, n$, are independent of each other, and all elements take the values $\pm 1$ with probability 1/2 each, with the correlation structure
\[
E\bigl[\varepsilon_i^{(t)}\varepsilon_{i'}^{(t)}\bigr] = E\bigl[\delta_i^{(t)}\delta_{i'}^{(t)}\bigr] = e^{-2d_{ii'}}, \qquad
E\bigl[\tilde\varepsilon_j^{(t)}\tilde\varepsilon_{j'}^{(t)}\bigr] = E\bigl[\tilde\delta_j^{(t)}\tilde\delta_{j'}^{(t)}\bigr] = e^{-2\tilde d_{jj'}}.
\]
Here we assume Haldane's model as the linkage structure. The genetic distance between markers $i$ and $i'$ on chromosome 1 is denoted by $d_{ii'}$ (M), and the genetic distance between markers $j$ and $j'$ on chromosome 2 is denoted by $\tilde d_{jj'}$ (M).
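The exponential correlation above follows from Haldane's map function, under which the recombination fraction at genetic distance $d$ (in Morgans) is $r = (1 - e^{-2d})/2$. A $\pm 1$ allele indicator keeps its sign with probability $1 - r$ and flips with probability $r$, so the correlation is $1 - 2r$. A minimal sketch (the function names are ours):

```python
import math

def haldane_recombination_fraction(d_morgans):
    """Recombination fraction under Haldane's map function."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

def allele_correlation(d_morgans):
    """Correlation of +/-1 allele indicators at two loci a distance d apart."""
    r = haldane_recombination_fraction(d_morgans)
    # E[e * e'] = P(no recombination) - P(recombination) = (1 - r) - r
    return 1.0 - 2.0 * r

distances = [0.0, 0.01, 0.1, 0.5, 2.0]
correlations = [allele_correlation(d) for d in distances]
```

Unlinked loci correspond to $d = \infty$, where $r = 1/2$ and the correlation vanishes.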

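Using this notation, the $3\times 3$ table represented by Table 1.1 can be rewritten as
\[
\begin{pmatrix} n_{AA} & n_{AB} & n_{AH} \\ n_{BA} & n_{BB} & n_{BH} \\ n_{HA} & n_{HB} & n_{HH} \end{pmatrix}
= \sum_{t=1}^n
\begin{pmatrix} \frac14(1+\varepsilon_i^{(t)})(1+\delta_i^{(t)}) \\[2pt] \frac14(1-\varepsilon_i^{(t)})(1-\delta_i^{(t)}) \\[2pt] \frac12(1-\varepsilon_i^{(t)}\delta_i^{(t)}) \end{pmatrix}
\times \Bigl( \tfrac14(1+\tilde\varepsilon_j^{(t)})(1+\tilde\delta_j^{(t)}),\ \tfrac14(1-\tilde\varepsilon_j^{(t)})(1-\tilde\delta_j^{(t)}),\ \tfrac12(1-\tilde\varepsilon_j^{(t)}\tilde\delta_j^{(t)}) \Bigr).
\]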

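In order to derive the joint distribution of the chi-square statistics $T_{ij}$, we decompose the $3\times 3$ table into four $2\times 2$ tables (i)-(iv) according to Lemma 4.1.

(i) Table $\begin{pmatrix} n_{AA} & n_{AB} \\ n_{BA} & n_{BB} \end{pmatrix}$. The sum of the expected frequencies is $n/4$. From (4.2), the corresponding chi-square statistic has the asymptotic representation
\[
T_{1,ij} = \frac{1}{n/4}(n_{AA} - n_{AB} - n_{BA} + n_{BB})^2 + O_p(n^{-1/2}) = Z_{1,ij}^2 + O_p(n^{-1/2}), \qquad
Z_{1,ij} = \frac{1}{\sqrt n}\sum_{t=1}^n \frac{\varepsilon_i^{(t)}+\delta_i^{(t)}}{\sqrt2}\cdot\frac{\tilde\varepsilon_j^{(t)}+\tilde\delta_j^{(t)}}{\sqrt2}.
\]

(ii) Table $\begin{pmatrix} n_{AA}+n_{AB} & n_{AH} \\ n_{BA}+n_{BB} & n_{BH} \end{pmatrix}$. The sum of the expected frequencies is $n/2$. The corresponding chi-square statistic has the asymptotic representation
\[
T_{2,ij} = \frac{1}{n/2}\bigl((n_{AA}+n_{AB}) - n_{AH} - (n_{BA}+n_{BB}) + n_{BH}\bigr)^2 + O_p(n^{-1/2}) = Z_{2,ij}^2 + O_p(n^{-1/2}), \qquad
Z_{2,ij} = \frac{1}{\sqrt n}\sum_{t=1}^n \frac{\varepsilon_i^{(t)}+\delta_i^{(t)}}{\sqrt2}\,\tilde\varepsilon_j^{(t)}\tilde\delta_j^{(t)}.
\]

(iii) Table $\begin{pmatrix} n_{AA}+n_{BA} & n_{AB}+n_{BB} \\ n_{HA} & n_{HB} \end{pmatrix}$. The sum of the expected frequencies is $n/2$. The corresponding chi-square statistic has the asymptotic representation
\[
T_{3,ij} = \frac{1}{n/2}\bigl((n_{AA}+n_{BA}) - (n_{AB}+n_{BB}) - n_{HA} + n_{HB}\bigr)^2 + O_p(n^{-1/2}) = Z_{3,ij}^2 + O_p(n^{-1/2}), \qquad
Z_{3,ij} = \frac{1}{\sqrt n}\sum_{t=1}^n \varepsilon_i^{(t)}\delta_i^{(t)}\,\frac{\tilde\varepsilon_j^{(t)}+\tilde\delta_j^{(t)}}{\sqrt2}.
\]

(iv) Table $\begin{pmatrix} n_{AA}+n_{AB}+n_{BA}+n_{BB} & n_{AH}+n_{BH} \\ n_{HA}+n_{HB} & n_{HH} \end{pmatrix}$. The sum of the expected frequencies is $n$. The corresponding chi-square statistic has the asymptotic representation
\[
T_{4,ij} = \frac{1}{n}\bigl((n_{AA}+n_{AB}+n_{BA}+n_{BB}) - (n_{AH}+n_{BH}) - (n_{HA}+n_{HB}) + n_{HH}\bigr)^2 + O_p(n^{-1/2}) = Z_{4,ij}^2 + O_p(n^{-1/2}), \qquad
Z_{4,ij} = \frac{1}{\sqrt n}\sum_{t=1}^n \varepsilon_i^{(t)}\delta_i^{(t)}\,\tilde\varepsilon_j^{(t)}\tilde\delta_j^{(t)}.
\]

By Lemma 4.1, $T_{ij} = \sum_{k=1}^4 Z_{k,ij}^2 + O_p(n^{-1/2})$. From the correlation structure of the allele indicators, $Z_{k,ij}$ ($k = 1, 2, 3, 4$) has mean 0 and the covariance structure
\[
E[Z_{k,ij}Z_{k',i'j'}] =
\begin{cases}
e^{-2d_{ii'} - 2\tilde d_{jj'}} & (k = k' = 1), \\
e^{-2d_{ii'} - 4\tilde d_{jj'}} & (k = k' = 2), \\
e^{-4d_{ii'} - 2\tilde d_{jj'}} & (k = k' = 3), \\
e^{-4d_{ii'} - 4\tilde d_{jj'}} & (k = k' = 4), \\
0 & (k \ne k').
\end{cases}
\]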



Part (a) of Proposition 1.1 follows from the central limit theorem and the continuous mapping theorem.

When markers $i$ and $i'$ are on different chromosomes, or markers $j$ and $j'$ are on different chromosomes, we can let $d_{ii'} = \infty$ or $\tilde d_{jj'} = \infty$. In each case, $E[Z_{k,ij}Z_{k',i'j'}] = 0$ for all $k$ and $k'$. This implies that the statistics $T_{ij}$ and $T_{i'j'}$ are made from random variables whose limiting distributions are independent Gaussian, and hence part (b) of Proposition 1.1 follows.



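4.2 Proof of Theorem 2.1
4.2.1 Proof of (2.5)

By arranging the index set $J$ in the lexicographic order, we can let $j^\circ = (j_1^\circ, \ldots, j_p^\circ) \in J$ be the first point such that the random field $Y(jD)$ takes a value of at least $b$. Let
\[
J^\circ(j^\circ) = \bigl\{ j \in J \bigm| j_1 > j_1^\circ,\ \text{or}\ j_1 = j_1^\circ,\ j_2 > j_2^\circ,\ \text{or}\ \ldots,\ \text{or}\ j_1 = j_1^\circ, \ldots, j_{p-1} = j_{p-1}^\circ,\ j_p > j_p^\circ \bigr\}.
\]
Let $\mathbb{S}^{m-1}$ be the unit sphere in $\mathbb{R}^m$. Let $du$ be its volume element at $u \in \mathbb{S}^{m-1}$. Let $dy = (y, y + dy)$.

The event $\{\max_{j\in J} Y(jD) \ge b\}$ is exclusively divided by the value of $j^\circ \in J$ (see, e.g., Dupuis and Siegmund (2000), (15)) as
\[
\begin{aligned}
P\Bigl(\max_{j\in J} Y(jD) \ge b\Bigr)
&= \sum_{j^\circ\in J} \int_{y\ge b}\int_{u\in\mathbb{S}^{m-1}} P\Bigl(\max_{j\in J^\circ(j^\circ)} Y(jD) < b \Bigm| Z(j^\circ D) = yu\Bigr)
 \times P\Bigl(Y(j^\circ D) \in dy,\ \frac{Z(j^\circ D)}{Y(j^\circ D)} \in du\Bigr) \\
&= \sum_{j^\circ\in J} \int_{x\ge 0}\int_{u\in\mathbb{S}^{m-1}} P\Bigl(\max_{j\in J^\circ(j^\circ)} Y(jD) < b \Bigm| Z(j^\circ D) = (b + x/b)u\Bigr)
 \times P\Bigl(Y(j^\circ D) \in b + \frac{(x, x+dx)}{b},\ \frac{Z(j^\circ D)}{Y(j^\circ D)} \in du\Bigr).
\end{aligned} \tag{4.3}
\]
In the last expression, we made the change of variable $y = b + x/b$.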
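For fixed $j^\circ$, $Z(j^\circ D) \sim N_m(0, I_m)$, and hence $Y(j^\circ D) \sim \chi_m$ and $Z(j^\circ D)/Y(j^\circ D) \sim \mathrm{Unif}(\mathbb{S}^{m-1})$ are independent. Therefore,
\[
\begin{aligned}
P\Bigl(Y(j^\circ D) \in b + \frac{(x, x+dx)}{b},\ \frac{Z(j^\circ D)}{Y(j^\circ D)} \in du\Bigr)
&= P\Bigl(Y(j^\circ D)^2 \in \bigl((b + x/b)^2,\ (b + x/b)^2 + 2dx\bigr)\Bigr) \times \frac{du}{\mathrm{Vol}(\mathbb{S}^{m-1})} \\
&\sim \frac{b^{m-2}e^{-b^2/2}}{2^{m/2}\Gamma(m/2)}\, e^{-x}\, 2\,dx \times \frac{du}{\mathrm{Vol}(\mathbb{S}^{m-1})}.
\end{aligned} \tag{4.4}
\]
Moreover, as shown later,
\[
\int_{x\ge 0} e^{-x} P\Bigl(\max_{j\in J^\circ(j^\circ)} Y(jD) < b \Bigm| Z(j^\circ D) = yu\Bigr) dx \sim \prod_{i=1}^p \rho_i c_i^2\, \nu\bigl(c_i\sqrt{2\rho_i}\bigr) \tag{4.5}
\]
($y = b + x/b$, $\rho_i = \rho_i(u)$ is in (2.6)).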

By substituting (4.4) and (4.5) into (4.3) and noting that $\prod_i D_i \sum_{j^\circ\in J} \sim \int_T \prod_i dt_i = |T|$ and $\mathrm{Vol}(\mathbb{S}^{m-1}) = 2\pi^{m/2}/\Gamma(m/2)$, we obtain



pfmaxr(jX)) > 6 



ITI 



n.A (27r) 



) j§™-i 



This means (2.5).



4.2.2 Proof of (4.5)

We use the large-deviation approach developed by Siegmund (1988). See also Kim and Siegmund (1989).

Suppose that $t$ is fixed. Under the conditional probability measure given $Z(t) = (Z_k(t))_{1\le k\le m} = \xi = (\xi_k)_{1\le k\le m}$, the $\mathbb{R}^m$-valued random field $Z(t + h) = (Z_k(t + h))_{1\le k\le m}$ with the index $h = (h_i)_{1\le i\le p}$ is a Gaussian random field with a mean of
\[
E[Z_k(t + h) \mid \xi] = R_k(h)\,\xi_k
\]
and a covariance function of
\[
\mathrm{Cov}(Z_k(t + h), Z_{k'}(t + h') \mid \xi) =
\begin{cases} R_k(h - h') - R_k(h)R_k(h') & (k = k'), \\ 0 & (k \ne k'). \end{cases}
\]
When $h_i$ is small, these moments can be rewritten as
\[
E[Z_k(t + h) \mid \xi] = \xi_k - \xi_k \sum_{i=1}^p \rho_{ki}|h_i| + \xi_k\, o(|h|),
\]
\[
\mathrm{Cov}(Z_k(t + h), Z_k(t + h') \mid \xi) = \sum_{i=1}^p \rho_{ki}\bigl(|h_i| + |h_i'| - |h_i - h_i'|\bigr) + o(|h|).
\]
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We consider an asymptotics 

UW^oo such that ^k/U\\=Uk, ||e||v^ = 0(l). 
Since Zkit + h)=^k + O(v^) = ^.(1 + we have 



Y{t + h) 



Y,Zk{t + hy 

\ k=l 



mv 



ii^ii {i + E ^^^^^^^^^^(1 + oim + om 



In this expression, we used 

Zk{t + hf - il = 2UZk(t + h)- + 0{\h\)) = 0(1) 



and ik{Zk{t + h) - 



0{\h\). Next, consider a conditional random field with 



the index $h$ defined by $\|\xi\|\{Y(t + h) - \|\xi\|\}$. The leading terms of the mean and covariance function of this field are shown to be
\[
-\sum_k \xi_k^2 \sum_i \rho_{ki}|h_i|, \qquad \sum_k \xi_k^2 \sum_i \rho_{ki}\bigl(|h_i| + |h_i'| - |h_i - h_i'|\bigr), \tag{4.6}
\]
respectively.

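From now on, let $t = j^\circ D$ and $h = (j - j^\circ)D$ in the multi-index notation of (2.4), and consider the following (finite-dimensional) joint distribution under the condition that $Z(j^\circ D) = \xi$:
\[
\bigl\{ \|\xi\|\{Y(jD) - \|\xi\|\} \bigr\}, \qquad j = (j_1, \ldots, j_p) \in J \subset \mathbb{Z}^p. \tag{4.7}
\]
When $\|\xi\|, b \to \infty$ and $D_i \to 0$ such that $\|\xi\| \sim b$ and $b\sqrt{D_i} \to c_i \in (0, \infty)$, from (4.6), the limit of the conditional mean is
\[
-\sum_k u_k^2 \sum_i \rho_{ki}c_i^2\,|j_i - j_i^\circ| = -\sum_i \rho_i c_i^2\,|j_i - j_i^\circ|
\]
with $\rho_i = \rho_i(u)$ defined in (2.6), and the limit of the covariance between $b\{Y(jD) - \|\xi\|\}$ and $b\{Y(j'D) - \|\xi\|\}$ ($j' = (j_1', \ldots, j_p')$) is
\[
\sum_k u_k^2 \sum_i \rho_{ki}c_i^2\bigl(|j_i - j_i^\circ| + |j_i' - j_i^\circ| - |j_i - j_i'|\bigr)
= \sum_i \rho_i c_i^2\bigl(|j_i - j_i^\circ| + |j_i' - j_i^\circ| - |j_i - j_i'|\bigr)
\]
\[
= \sum_i \begin{cases} 2\rho_i c_i^2 \min\bigl(|j_i - j_i^\circ|, |j_i' - j_i^\circ|\bigr) & (j_i - j_i^\circ \text{ and } j_i' - j_i^\circ \text{ have the same sign}), \\ 0 & (\text{otherwise}). \end{cases}
\]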
Since the limit is again Gaussian, the limiting distribution of (4.7) is equivalent to the distribution of
\[
\sum_{i=1}^p S_{i,\,j_i - j_i^\circ},
\]
where
\[
S_{it} = \begin{cases} X_{i1} + \cdots + X_{it} & (t > 0), \\ 0 & (t = 0), \\ X_{i,-1} + \cdots + X_{i,t} & (t < 0), \end{cases}
\]
with $X_{it} \sim N(-\rho_i c_i^2,\ 2\rho_i c_i^2)$ ($i = 1, \ldots, p$, $t \in \mathbb{Z}$) being independent Gaussian random variables.

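Summarizing the discussion above, we have proved that for $y = \|\xi\| = b + x/b \sim b$,
\[
\begin{aligned}
P\Bigl(\max_{j\in J^\circ(j^\circ)} Y(jD) < b \Bigm| Z(j^\circ D) = yu\Bigr)
&= P\Bigl(\max_{j\in J^\circ(j^\circ)} b\{Y(jD) - y\} < -x \Bigm| Z(j^\circ D) = \xi\Bigr) \\
&\sim P\Bigl(\max_{j\in J^\circ(j^\circ)} \sum_{i=1}^p S_{i,\,j_i - j_i^\circ} < -x\Bigr).
\end{aligned} \tag{4.8}
\]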

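In what follows, let $j := j - j^\circ$ for simplicity, so that $j \in J^\circ(j^\circ)$ is rewritten as $j \in J^\circ(0)$. Let
\[
M_i^+ = \max_{j>0} S_{ij}, \qquad M_i^- = \max_{j\le 0} S_{ij}.
\]
Because of
\[
\max_{j\in J^\circ(0)} = \max\Bigl\{ \max_{j_1>0,\ j_2,\ldots,j_p\in\mathbb{Z}},\ \max_{j_1=0,\ j_2>0,\ j_3,\ldots,j_p\in\mathbb{Z}},\ \ldots,\ \max_{j_1=j_2=\cdots=j_{p-1}=0,\ j_p>0} \Bigr\},
\]
the event
\[
\max_{j\in J^\circ(0)} \sum_{i=1}^p S_{ij_i} < -x \tag{4.9}
\]
is equivalent to the event that all of the following inequalities hold:
\[
\begin{aligned}
M_1^+ + \max\{M_2^+, M_2^-\} + \max\{M_3^+, M_3^-\} + \cdots + \max\{M_p^+, M_p^-\} &< -x, \\
M_2^+ + \max\{M_3^+, M_3^-\} + \cdots + \max\{M_p^+, M_p^-\} &< -x, \\
&\ \,\vdots \\
M_p^+ &< -x.
\end{aligned}
\]
Since $M_p^- \ge 0$, if both
\[
M_1^+ + \max\{M_2^+, M_2^-\} + \cdots + \max\{M_{p-1}^+, M_{p-1}^-\} + M_p^- < -x
\]
and $M_p^+ < -x$ hold, then
\[
\begin{aligned}
M_1^+ + \max\{M_2^+, M_2^-\} + \cdots + \max\{M_{p-1}^+, M_{p-1}^-\} + M_p^+
&< -x - M_p^- + M_p^+ \\
&< -2x \le -x
\end{aligned}
\]
holds. This implies that
\[
M_1^+ + \max\{M_2^+, M_2^-\} + \cdots + \max\{M_{p-1}^+, M_{p-1}^-\} + \max\{M_p^+, M_p^-\} < -x.
\]
Therefore, (4.9) is equivalent to the event that all of the following hold:
\[
\begin{aligned}
M_1^+ + \max\{M_2^+, M_2^-\} + \cdots + \max\{M_{p-1}^+, M_{p-1}^-\} + M_p^- &< -x, \\
M_2^+ + \cdots + \max\{M_{p-1}^+, M_{p-1}^-\} + M_p^- &< -x, \\
&\ \,\vdots \\
M_p^+ &< -x.
\end{aligned}
\]
Repeating this argument reveals that (4.9) is equivalent to the event that all of the following inequalities hold:
\[
\begin{aligned}
M_1^+ + M_2^- + M_3^- + \cdots + M_p^- &< -x, \\
M_2^+ + M_3^- + \cdots + M_p^- &< -x, \\
&\ \,\vdots \\
M_p^+ &< -x.
\end{aligned}
\]
That is,
\[
\begin{aligned}
(4.8) &\sim P\bigl(M_i^+ + M_{i+1}^- + \cdots + M_p^- < -x,\ 1 \le i \le p\bigr) \\
&= P\Bigl(\max_{i\le p}\bigl(M_i^+ + M_{i+1}^- + \cdots + M_p^-\bigr) < -x\Bigr).
\end{aligned} \tag{4.10}
\]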

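Because the mean $-\mu_i$ and variance $\sigma_i^2$ of $X_{it}$ satisfy
\[
\frac{\mu_i}{\sigma_i^2} = \frac{\rho_i c_i^2}{2\rho_i c_i^2} = \frac12,
\]
it follows for any $p \ge 1$ that
\[
\int_0^\infty e^{-x} P\Bigl(\max_{1\le i\le p}\bigl(M_i^+ + M_{i+1}^- + \cdots + M_p^-\bigr) < -x\Bigr) dx
= \prod_{i=1}^p \mu_i\,\nu(2\mu_i/\sigma_i) = \prod_{i=1}^p \rho_i c_i^2\,\nu\bigl(c_i\sqrt{2\rho_i}\bigr). \tag{4.11}
\]
A proof is given below. Combining (4.8), (4.10) and (4.11) yields (4.5).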
4.2.3 Proof of (4.11)

Note that $M_1^+, M_1^-, \ldots, M_p^+, M_p^-$ are all independent. A proof for $p = 1$ is given by Siegmund (1992), Lemma 19. For $p \ge 2$, from the integration by parts essentially proved by Siegmund (1992), Proposition 24, we have

RHS of fim]) 

= e'^l^max (M+ + M~^^ H h M~) < -x^ dx 

= e-^'P max ^(M+ + M^^ + ■ ■ ■ + M') < -x) P (^M+ < -x) dx 
= ^pv{2^p/ap) [ e''^p( max (M+ + M' + ■ ■ • + M„ J < ~x] dx. 
The proof follows from mathematical induction. 

4.3 Proof of Theorem 2.2

4.3.1 Random fields defined by triangulation

First, we discuss in detail the construction of the random field by triangulation of the index set. It is well known that a $p$-dimensional cube $[0, 1]^p$ can be dissected into $p!$ congruent simplices. For example, let $\Pi_p$ be the set of all permutations of $\{1, \ldots, p\}$, and for each $\pi \in \Pi_p$ let
\[
S_\pi = \{(x_1, \ldots, x_p) \in [0, 1]^p \mid x_{\pi(1)} \ge \cdots \ge x_{\pi(p)}\}.
\]
Then, $[0, 1]^p = \bigcup_{\pi\in\Pi_p} S_\pi$, and $S_\pi$ and $S_{\pi'}$ ($\pi \ne \pi'$) do not share any interior point. We dissect the $p$-dimensional rectangle whose vertices are flanking lattice points,
\[
[d_{1,j_1-1}, d_{1j_1}] \times \cdots \times [d_{p,j_p-1}, d_{pj_p}],
\]
into $p!$ simplices according to the same rule. Let $e_i \in \mathbb{R}^p$ be the vector whose elements are all 0 except for the $i$th element, which is 1. Write
\[
t_0 = (d_{1,j_1-1}, \ldots, d_{p,j_p-1}), \qquad D_i = D_{ij_i} = d_{ij_i} - d_{i,j_i-1} \quad (i = 1, \ldots, p)
\]
for simplicity. Then, one of the resulting simplices produced by the dissection is
\[
\mathrm{conv}\Bigl\{ t_0 + \sum_{l=1}^i D_l e_l \Bigm| i = 0, 1, \ldots, p \Bigr\}. \tag{4.12}
\]
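The dissection rule can be made concrete with a short sketch. It uses 0-based axis indices, and the function names are our own; it finds the simplex $S_\pi$ containing a given point of the unit cube and lists the vertices of the simplex obtained by stepping along the axes in the order given by $\pi$. The identity permutation reproduces the vertex set in (4.12).

```python
import itertools
import math

def simplex_index(x):
    """Permutation pi (0-based) such that x[pi[0]] >= x[pi[1]] >= ... ."""
    return tuple(sorted(range(len(x)), key=lambda i: -x[i]))

def in_simplex(x, pi):
    """True if x lies in S_pi, i.e. its coordinates are non-increasing along pi."""
    return all(x[pi[k]] >= x[pi[k + 1]] for k in range(len(pi) - 1))

def simplex_vertices(t0, D, pi):
    """Vertices obtained by adding D[axis] * e_axis successively in the order pi."""
    vertex = list(t0)
    vertices = [tuple(vertex)]
    for axis in pi:
        vertex[axis] += D[axis]
        vertices.append(tuple(vertex))
    return vertices

p = 3
perms = list(itertools.permutations(range(p)))
point = (0.2, 0.9, 0.5)
containing = [pi for pi in perms if in_simplex(point, pi)]
identity_vertices = simplex_vertices((0, 0, 0), (1, 2, 3), (0, 1, 2))
```

A point with distinct coordinates lies in exactly one of the $p!$ simplices, which is the statement that the simplices share no interior point.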

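Let
\[
\zeta_i = Z_k\Bigl(t_0 + \sum_{l=1}^i D_l e_l\Bigr), \qquad i = 0, 1, \ldots, p,
\]
be the values of the random field at the $p + 1$ vertices of the simplex (4.12). Then $\zeta = (\zeta_0, \zeta_1, \ldots, \zeta_p)^\top$ is a Gaussian random vector with mean 0 and covariance matrix
\[
\Sigma = \begin{pmatrix}
1 & \tau_1 & \tau_1\tau_2 & \cdots & \tau_1\tau_2\tau_3\cdots\tau_p \\
  & 1 & \tau_2 & \cdots & \tau_2\cdots\tau_p \\
  &   & 1 & \ddots & \vdots \\
  &   &   & \ddots & \tau_p \\
  &   &   &        & 1
\end{pmatrix}_{(p+1)\times(p+1)} \tag{4.13}
\]
(the lower triangular part is determined by symmetry), where $\tau_i = \mathrm{Cov}(Z_k(t), Z_k(t + D_i e_i)) = R_{ki}(D_i)$. (Although $\zeta$ and $\tau_i$ depend on $k$, we omit the index $k$ for simplicity.) We can define the random field by interpolating the random vector $\zeta$ into the simplex (4.12). To be precise, by the affine bijection map from the canonical $p$-dimensional simplex
\[
\Delta^p = \mathrm{conv}\{0, e_1, \ldots, e_p\} = \Bigl\{ s \in \mathbb{R}^p \Bigm| 0 \le s_i,\ \sum_i s_i \le 1 \Bigr\}
\]
to the simplex (4.12), we can introduce a parameter (local coordinates) $s = (s_i)$ into (4.12), and define a Gaussian random field by
\[
Z_k(s) = \frac{\varphi(s)^\top \zeta}{\sqrt{\varphi(s)^\top \Sigma\, \varphi(s)}}, \qquad \varphi(s) = \Bigl(1 - \sum_i s_i,\ s_1, \ldots, s_p\Bigr)^\top,
\]
where the denominator is the normalizing constant chosen so that the variance of $Z_k(s)$ is 1.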

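4.3.2 Volume of the index set of the chi-square random fields

The volume of the index set $T \times \mathbb{S}^{m-1}$ can be obtained by summing up the volumes of the index sets $\Delta^p \times \mathbb{S}^{m-1}$ for the Gaussian random fields
\[
X(s, u) = \sum_{k=1}^m u_k Z_k(s), \qquad (s, u) \in \Delta^p \times \mathbb{S}^{m-1}.
\]
Let $u = u(\theta_a)$ be a local coordinate of $\mathbb{S}^{m-1}$. Partial derivatives with respect to $s_i$ and $\theta_a$ are denoted by $\partial_i$ and $\partial_a$, respectively. The covariance matrix of
\[
\partial_i X(s, u) = \sum_{k=1}^m u_k\,\partial_i Z_k(s), \qquad \partial_a X(s, u) = \sum_{k=1}^m \partial_a u_k\, Z_k(s)
\]
is
\[
\begin{pmatrix} \sum_{k=1}^m u_k^2\, g_{k,ij}(s) & 0 \\ 0 & g_{ab}(u) \end{pmatrix},
\]
where
\[
g_{k,ij}(s) = E[\partial_i Z_k(s)\,\partial_j Z_k(s)], \qquad g_{ab}(u) = \sum_{k=1}^m \partial_a u_k\,\partial_b u_k.
\]
Hence, the volume of the index manifold $\Delta^p \times \mathbb{S}^{m-1}$ is
\[
\mathrm{Vol}(\Delta^p \times \mathbb{S}^{m-1}) = \int_{\Delta^p\times\mathbb{S}^{m-1}} C(s, u),
\]
where
\[
C(s, u) = \det\Bigl(\sum_{k=1}^m u_k^2\, g_{k,ij}(s)\Bigr)^{1/2} \prod_i ds_i\, du, \qquad du = \det(g_{ab}(u))^{1/2} \prod_a d\theta_a
\]
is the volume element.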

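We consider the case where $D_i \sim 0$, or equivalently $\tau_i \sim 1$, in $\Sigma$ of (4.13). Let $J$ be the $(p + 1) \times (p + 1)$ matrix whose elements are all 1. Then,
\[
\Sigma = J - \Sigma_1 + O(\max|1 - \tau_i|^2),
\]
where $\Sigma_1$ is a symmetric matrix such that
\[
(\Sigma_1)_{ii} = 0, \qquad (\Sigma_1)_{ij} = \sum_{l=i}^{j-1} (1 - \tau_l) \quad (i < j).
\]
By using the covariance function
\[
r_k(s, s') = \mathrm{Cov}(Z_k(s), Z_k(s')),
\]
the metric of the index set is induced by
\[
g_k(s) = (g_{k,ij}(s))_{1\le i,j\le p}, \qquad g_{k,ij}(s) = \frac{\partial^2 r_k(s, s')}{\partial s_i\,\partial s_j'}\Bigr|_{s'=s}.
\]
Simple calculations yield
\[
g_{k,ij} = \frac{\varphi_i^\top\Sigma\varphi_j}{\varphi^\top\Sigma\varphi} - \frac{(\varphi_i^\top\Sigma\varphi)(\varphi_j^\top\Sigma\varphi)}{(\varphi^\top\Sigma\varphi)^2}, \qquad
\varphi_i = \frac{\partial\varphi}{\partial s_i} = (-1, \underbrace{0, \ldots, 0}_{i-1}, 1, \underbrace{0, \ldots, 0}_{p-i})^\top.
\]
Abbreviating $O(\max|1 - \tau_i|)$ as $O$ yields
\[
\varphi^\top\Sigma\varphi = \varphi^\top J\varphi + O = 1 + O, \qquad \varphi_i^\top\Sigma\varphi = \varphi_i^\top J\varphi + O = 0 + O,
\]
\[
\begin{aligned}
\varphi_i^\top\Sigma\varphi_j &= \varphi_i^\top J\varphi_j - \varphi_i^\top\Sigma_1\varphi_j + O^2 = -\varphi_i^\top\Sigma_1\varphi_j + O^2 \\
&= -(\Sigma_1)_{11} + (\Sigma_1)_{1,j+1} + (\Sigma_1)_{i+1,1} - (\Sigma_1)_{i+1,j+1} + O^2 \\
&= \sum_{l=1}^{j}(1 - \tau_l) + \sum_{l=1}^{i}(1 - \tau_l) - \sum_{l=i+1}^{j}(1 - \tau_l) + O^2 \quad (i < j),
\end{aligned}
\]
and
\[
g_{k,ij} = \Bigl\{ 2\sum_{l=1}^{\min(i,j)} (1 - \tau_l) \Bigr\}\bigl(1 + O(\max|1 - \tau_i|)\bigr).
\]
By substituting $\tau_i = 1 - \rho_{ki}D_i + o(D_i)$, we obtain
\[
g_{k,ij} = \Bigl( 2\sum_{l=1}^{\min(i,j)} \rho_{kl}D_l \Bigr)(1 + o(1)) \qquad (\max D_i \to 0).
\]
Some simple calculations yield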



1/2 min(«J) m 

det(^«2^,,,(s)) =det(2 J2 (E^^'^^O^O (^ + ^^^)) 

/=1 A;=l 



k=l 1=1 k=l 



1=1 1=1 



where Pi{u) is defined in fl2.6p . Combined with 



we obtain the volume of the index set x as 

p- i=i -^s^-i 

By letting A := D^., and summing up (14.141) with respect to ji = {i = 

1, . . . ,p), we can show that the volume of T x is 

P rii 

Vol(T X W^-^) = 2^/2^7 JJ(^^Dj/2j(l + 0(1)). 

i=i j=i 

By substituting this into ( I2.10p . we obtain the tube formula (12. lip for the probability 
P(maxjg^r(t) > h). 
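The determinant step above rests on a simple identity: a matrix whose $(i, j)$ entry is $\sum_{l \le \min(i,j)} a_l$ has determinant $\prod_l a_l$, because subtracting each row from the next leaves a triangular matrix with diagonal $a_1, \ldots, a_p$. The following sketch checks this with exact rational arithmetic; the values of $a_l$ are arbitrary illustrative numbers standing in for $2\rho_l(u)D_l$, and the function names are ours.

```python
from fractions import Fraction

def min_sum_matrix(a):
    """Matrix with (i, j) entry a[0] + ... + a[min(i, j)] (0-based indices)."""
    p = len(a)
    return [[sum(a[: min(i, j) + 1]) for j in range(p)] for i in range(p)]

def determinant(matrix):
    """Determinant by Gaussian elimination with exact Fraction arithmetic."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

# Arbitrary positive values (illustration only).
a = [Fraction(1, 2), Fraction(3), Fraction(2, 5)]
det_value = determinant(min_sum_matrix(a))
product = a[0] * a[1] * a[2]
```

Taking the square root of this determinant gives the factor $2^{p/2}\prod_l \rho_l(u)^{1/2}D_l^{1/2}$ appearing in the volume element.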
[Figure panels: (a) $T = [0,1]^2$, df = 4; (b) $D_{ij} = 0.01$, df = 4]

Figure 2.1: Comparisons of upper probability formulas. (a) $T = [0, 1]^2$, $D_{ij} = 0.05, 0.01, 0.002$ (equally spaced), degrees of freedom $m = 4$; (b) $T = [0, 0.5]^2, [0, 1]^2, [0, 2]^2$, $D_{ij} = 0.01$ (equally spaced), $m = 4$; (c) $m = 1$, other conditions are the same as (a); (d) $T = [0, 1]^2$, $D_{ij} = 0.01$ (equally spaced), $(0.5, 1, 0.5, 1, 3, 0.5, 1, 0.5, 1, 1, \ldots)/100$ (pattern I), $(0.5, 0.5, 3, 0.5, 0.5, \ldots)/100$ (pattern II), $m = 4$